The “done” and “coming soon” statuses are just to help me keep track of what I have put into my main portfolio rmd.
Github install
GitBash install
RStudio install
#created MICB425_portfolio directory on my computer
#created new repository 'MICB_portfolio' on my Github account
git init
git add .
git commit -m "comment text" #comment was 'First commit'
git remote add origin [repository url] #URL was taken from repository page on Github
git remote -v #just to check that URL was correct
git push -u origin master
The following assignment is an exercise for the reproduction of this .html document using the RStudio and RMarkdown tools we’ve shown you in class. Hopefully by the end of this, you won’t feel at all the way this poor PhD student does. We’re here to help, and when it comes to R, the internet is a really valuable resource. This open-source program has all kinds of tutorials online.
http://phdcomics.com/ Comic posted 1-17-2018
The goal of this R Markdown html challenge is to give you an opportunity to play with a bunch of different RMarkdown formatting. Consider it a chance to flex your RMarkdown muscles. Your goal is to write your own RMarkdown that rebuilds this html document as close to the original as possible. So, yes, this means you get to copy my irreverent tone exactly in your own Markdowns. It’s a little window into my psyche. Enjoy =)
hint: go to the PhD Comics website to see if you can find the image above
If you can’t find that exact image, just find a comparable image from the PhD Comics website and include it in your markdown
Let’s be honest, this header is a little arbitrary. But show me that you can reproduce headers with different levels please. This is a level 3 header, for your reference (you can most easily tell this from the table of contents).
Perhaps you’re already really confused by the whole markdown thing. Maybe you’re so confused that you’ve forgotten how to add. Never fear! A calculator R is here:
1231521+12341556280987
## [1] 1.234156e+13
Or maybe, after you’ve added those numbers, you feel like it’s about time for a table! I’m going to leave all the guts of the coding here so you can see how libraries (R packages) are loaded into R (more on that later). It’s not terribly pretty, but it hints at how R works and how you will use it in the future. The summary function used below is a nice data exploration function that you may use in the future.
library(knitr)
kable(summary(cars),caption="I made this table with kable in the knitr package library")
| speed | dist |
|---|---|
| Min. : 4.0 | Min. : 2.00 |
| 1st Qu.:12.0 | 1st Qu.: 26.00 |
| Median :15.0 | Median : 36.00 |
| Mean :15.4 | Mean : 42.98 |
| 3rd Qu.:19.0 | 3rd Qu.: 56.00 |
| Max. :25.0 | Max. :120.00 |
And now you’ve almost finished your first RMarkdown! Feeling excited? We are! In fact, we’re so excited that maybe we need a big finale eh? Here’s ours! Include a fun gif of your choice!
Describe the numerical abundance of microbial life in relation to ecology and biogeochemistry of Earth systems.
Evidence worksheet 03 Rockstrom 2009
atmospheric aerosol loading
Systems currently past limit values:
| System | Variable | Limit Value | Current Value |
|---|---|---|---|
| Climate Change | CO2 | 350ppm | 418ppm (387ppm at time of article) |
| Climate Change | Radiative Forcing | \(1\ \text{W m}^{-2}\) | \(1.5\ \text{W m}^{-2}\) |
| Biodiversity Loss | Species Loss Rate | 10x Background Rate | 100-1000x Background Rate |
| Nitrogen Cycle | N2 converted to NO3 or NH4 | \(35*10^{6}\) ton/yr | \(120*10^{6}\) ton/yr |
I thought it was quite straightforward. It was pretty haunting to see their quoted atmospheric CO2 concentration and think “Did they get that wrong?” just to realize this was written 9 years ago and we have already pushed another 30ppm past the limit proposed here.
Aquatic - The majority of aquatic prokaryotic life is found in the open ocean. These cells have a short turnover time and therefore a high cellular productivity, which means that mutations and other rare genetic events are more likely to occur here than in other habitats.
Subsurface - Major habitat for prokaryotes, with most of the subsurface biomass supported by organic matter deposited from the surface.
Soil - Major reservoir of organic carbon; prokaryotes are essential in soil decomposition
| Environment | Aquatic | Subsurface | Soil |
|---|---|---|---|
| Total abundance | \(1.18*10^{29}\) | \(3.8*10^{30}\) | \(2.556*10^{29}\) |
Density: \(5*10^5\) cells/mL
Cyanobacteria: \((4*10^4 \text{ cells/mL})/(5*10^5 \text{ cells/mL}) * 100 = 8\%\)
Cyanobacteria such as Prochlorococcus produce their own energy from sunlight via photosynthesis, which produces oxygen while fixing carbon. Despite only being 8% of the prokaryotic cell abundance in the upper 200m, they are responsible for approximately 50% of the oxygen in the atmosphere and contribute greatly to carbon cycling as demonstrated by their quick turnover time, resulting in \(8.2 * 10^{29}\) cells/year.
Autotrophs - bacteria that produce their own food, primarily using energy from the sun. In this paper only marine autotrophs are considered, and the overwhelming majority of them are said to be Prochlorococcus.
Heterotrophs - use organic carbon as an energy source and carbon source. They are the overwhelming majority of cells on Earth.
Lithotrophs - prokaryotes that gain energy from something other than organic carbon or sunlight. They are said to be found in small amounts in the subsurface, where organic carbon still sustains most life.
Cells/year = Population Size * (turnover/year)
\(2.9*10^{27} \text{ cells} * (365 \text{ days/year} / 1.5 \text{ days}) = 7.1*10^{29} \text{ cells/year}\)
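The turnover arithmetic above can be checked with a few lines of R. This is only a sketch: the function name `annual_production` and its argument names are my own, and the inputs are the population size and turnover time quoted in the equation.

```r
# Annual cell production = standing population * number of turnovers per year.
# `annual_production` is a hypothetical helper name, not from the paper.
annual_production <- function(population_cells, turnover_days, days_per_year = 365) {
  population_cells * (days_per_year / turnover_days)
}

# Values taken from the worked equation above.
autotroph_production <- annual_production(population_cells = 2.9e27, turnover_days = 1.5)

# Round to two significant figures for comparison with the hand calculation.
print(signif(autotroph_production, 2))
```

Writing it as a function makes it easy to repeat the calculation for the other habitats by swapping in their population sizes and turnover times.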
Tectonic movement along with photochemical reactions in the atmosphere allow for mixing and partitioning of chemical substrates on Earth.
Biogeochemical(biotic): (Redox)
Although there is enormous genetic diversity in nature, there remains a relatively stable set of core genes coding for the major redox reactions essential for life and biogeochemical cycles. Thus, microbial diversity does not necessarily entail diversity in proteins involved in metabolism.
It is hypothesized that there is limitless evolutionary diversity in nature. The rate of discovery of unique protein families has been proportional to the sampling effort, with the number of new protein families increasing approximately linearly with the number of new genomes sequenced.
“Microbial life can easily live without us; we, however, cannot survive without the global catalysis and environmental transformations it provides.” Do you agree or disagree with this statement? Answer the question using specific reference to your reading, discussions and content from evidence worksheets and problem sets.
Microbial actions and human life are indisputably intertwined. Humans, however, are not wholly dependent on the ecological catalysis of microbial life for the survival and proliferation of our species. Prokaryotic life catalyzes several important biogeochemical cycles globally, providing immense shifts in the redox states of nearly all biologically important compounds via their metabolic processes. Currently, both human activities and microbial metabolisms contribute to nutrient cycling on Earth, with microbes bearing most of the weight, but this balance is changing at an accelerating rate.
Organisms with consciousnesses have the ability to apply their efforts directionally towards a specific change. As humans, we can make plans while taking future conditions into account, we can cooperate towards a goal, and we can place a value on the success of individuals and species beyond the self. Microbes are bound by the forces of natural selection, and changes must be beneficial to an individual cell in order to be passed on to the next generation. Human consciousness gives humanity the capacity to make faster and more drastic changes to the biogeochemical landscape than microbial life.
Over the two centuries since the industrial revolution, human industry, technology, and understanding of the universe have increased exponentially. Compared to the temporal scale on which geological processes and shifts in net global microbial function occur, all of human progress is only a tiny blip on the tail end of history. Although human impact on biogeochemical cycles is minimal compared to that of prokaryotes in the present day, the pattern of exponential increase in humanity’s capacity to alter the environment, combined with recent emerging biotechnologies, makes for a compelling argument that microbes will become replaceable in the not so distant future.
The potential for human processes to excel beyond microbial processes is facilitated by consciousness. This emergent property gives humanity three main abilities that mitigate the normal evolutionary selection processes: the ability to cooperate towards a goal that may not immediately be beneficial to all contributors; the ability to take future impacts of actions into account when making current plans; and the ability to place an innate value on the lives of humans and other species, both through compassion and through economic market forces. Together, this means that people can establish how a system functions, allocate resources towards a change in said system, and enact a plan not only to change the system in a specific direction, but to change it in what is determined to be the best direction. Conversely, changes to net prokaryotic population function, another emergent property, are determined solely by whether a small change is beneficial to the ability of an individual to reproduce. Natural selection places limits on the capacity of microbes to enact change on biogeochemical cycles compared to humans.
Both the maximum potential for change to biogeochemical cycles and the rate at which this potential can increase for microbes have boundary limits. The rate at which the global, net prokaryotic metabolism can change is limited by the rate at which cells divide and the rate at which the global microbial genetic pool can be altered. Although the estimated number of prokaryotic cells on Earth is astronomically large, on the order of \(10^{30}\), nearly all of them live in the terrestrial subsurface and have an average turnover rate on the scale of centuries (Whitman et al., 1998). Within geological time scales, this ‘silent majority’ is extremely active and relevant, but on the time scale of modern human environmental intervention, the division rate of these hidden cells makes their tremendous abundance much less consequential.
Cell division is also limited by energy availability. The primary input of energy to global biological systems is photosynthetic carbon fixation by higher plants and photosynthetic bacteria. The rate at which sunlight is transformed to an ecologically available energy source by a given photosynthetic population and the rate at which this energy can be disseminated to other organisms in the deep ocean and terrestrial sediment both place boundaries on the maximum global rate of microbial production. Global metabolic catalysis is dependent on the energy supplied to biological processes, and microbes have physical limitations to both maximum energy production and energy transfer between cells.
Beyond cellular division, genetic variation is necessary for changes to prokaryotic metabolic function, which in turn determines the ability for the global prokaryotic population to alter biogeochemical cycles. Horizontal gene transfer and the extremely high abundance of cells on Earth make useful mutations extremely common, even on small temporal scales. It is estimated that four simultaneous mutations occur in a cell every half hour, in the surface ocean alone (Whitman et al., 1998); however, this does not mean that the pool of available genes changes quickly. For a mutation to be heritable, it cannot be lethal. This presents a hard limit on the extent of change that can happen to genetic sequences in a single generation. Proteins vital to the survival of an individual cell, such as the metabolic enzymes relevant to many biogeochemical cycles, cannot be completely changed by mutation to a single cell in a single generation. Instead, functional diversity is the cumulative change to sequences over long time periods. Natural selection in a varied pool of random mutations is a system that strongly favours improvement to existing structures over the introduction of truly novel ones. The core set of proteins that carry out the metabolic redox reactions driving global biogeochemical cycles was developed extremely early in the history of life on Earth, and is still highly conserved (Falkowski et al., 2008). Microbial populations have boundary conditions that limit metabolic rate and functional change, imposed by both the processes of mutation and selection, and the rate of energy acquisition and distribution within a biological system. In the context of human activity, these boundaries have different limits, and may be able to be completely mitigated in the near future.
Recent history provides evidence of the potential for human activity to be the dominant controller of global nutrient cycles. Humanity has been raising the limits of energy acquisition and distribution since the first use of controlled fire, nearly 600 000 years ago (Berna et al., 2012). The ability to obtain and use energy more efficiently has increased along with the development of human civilizations. The first agriculture marked the beginning of a steady march toward increasingly efficient conversion of sunlight to available food sources, and made the first large-scale energy distribution network necessary: the transport and trade of food. The beginning of the industrial revolution marked the shift away from human bodies as the primary means of energy conversion from chemical to other forms. Vast canal systems for coal distribution made up the second, higher throughput energy distribution system. Finally, the discovery of electricity, along with the wide scale adoption of oil as a fuel source, ushered in the third generation of power production and distribution. In modern times, vast amounts of energy are produced by a ‘metabolism’ of human activity. Electrical and chemical energy are distributed along global networks of wires, pipes, and roads.
Energy availability becomes less of a limitation to maximum human impact on biogeochemical cycles every year. At the present date, human industry already rivals microbial metabolisms in the magnitude of its influence on nitrogen and carbon cycling. Atmospheric carbon dioxide measurements show an interannual increase in carbon dioxide due to anthropogenic combustion of fossil fuels, indicating that humanity already has the power to be the deciding factor in carbon cycling but has not yet implemented directional control (NOAA, 2018). Likewise, the Haber process has allowed human activity to synthetically reduce massive amounts of nitrogen gas to ammonium for use in agriculture. At the turn of the millennium, humans produced about half of all nitrogen fixed annually, and this value has been increasing exponentially since the 1940s (Rockstrom, 2009; Vitousek et al., 1997). Besides the conversion of organic matter to carbon dioxide, and the conversion of nitrogen gas to ammonium, human industry has the capacity to upset nearly any step in global biogeochemical cycles, should the current microbial processes become insufficient.
Man-made fixed nitrogen and carbon dioxide have both increased exponentially as global energy production has risen. However, the limitation of energy availability will not last much longer. A crude exponential fit of global energy production from 1820 to 2010 extrapolated to the year 4000 shows that humanity will consume the energy of our entire sun in just another 1800 years if production continues on the trend set since the industrial revolution (Fig. 1). In all likelihood, an element beyond energy availability, such as the maximum carrying capacity for human life on Earth, will set a new limit on human progress long before the need for a Dyson sphere is reached, but the key factor is that human energy production is virtually endless compared to the limited photosynthetic rate providing energy to microbial metabolism.
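A crude exponential fit of the kind described above can be sketched in R by fitting a straight line to the logarithm of energy production. The sketch below is an assumption about the method, not a record of how the figure was made: the helper names `fit_exponential` and `year_reaching` are hypothetical, and the series is a placeholder generated from a known growth rate rather than real energy data.

```r
# Fit energy = a * exp(rate * year) by least squares on log(energy).
fit_exponential <- function(year, energy) {
  model <- lm(log(energy) ~ year)
  coefs <- unname(coef(model))
  list(log_a = coefs[1], rate = coefs[2])
}

# Year at which the fitted curve reaches a target value.
year_reaching <- function(fit, target) {
  (log(target) - fit$log_a) / fit$rate
}

# Placeholder series: NOT real energy data. It is generated with a known
# growth rate of 2.5 % per year so the fit can be checked against it.
years <- seq(1820, 2010, by = 10)
placeholder_energy <- 20 * exp(0.025 * (years - 1820))

fit <- fit_exponential(years, placeholder_energy)
doubling_time <- log(2) / fit$rate

# Extrapolate to the year the curve is a million times its 2010 value.
target_year <- year_reaching(fit, 1e6 * placeholder_energy[length(years)])
```

With a real data set in place of the placeholder, `target` would be set to whatever energy ceiling is of interest, and the quality of the fit should be inspected before trusting an extrapolation this far outside the data.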
If the energy available to humans far outpaces that of microbes, the other factor at play is diversity of function and its rate of change. The human analog for the global microbial gene pool is the sum of human knowledge and available computational power. Computational power has increased exponentially since the first integrated circuits in the 1960s. Transistor density has followed Moore’s Law by doubling roughly every two years, although this is expected to stop in the near future as transistor sizes become small enough for quantum effects to cause problems with keeping microcircuits closed (Chien and Karamcheti, 2013). Gallium and other alternatives to silicon are being explored to put off the end to Moore’s Law, but these are all just stopgaps and eventually transistor density must plateau due to physical limitations. However, computational power can still increase exponentially without an increase in transistor density, as long as there is enough available energy to fabricate and run more computer chips. Energy production is not likely to reach a maximum limit before Moore’s Law is terminated, meaning future computational power will be tied to energy production, a value that has been increasing exponentially for two centuries.
The human analogue to microbial genetic diversity is the diversity of technologies available. The sum of human knowledge has increased exponentially as energy availability and societal changes have allowed for greater resource allocation to research. Specifically, the science of microbiology is only in its infancy. Microbes were first observed a mere 400 years ago and medical microbiology exploded just 100 years ago. DNA was first imaged 70 years ago and molecular techniques in biology have become increasingly complex since then. Humanity’s collective understanding of how life functions, including how microbes impact geochemical cycles, has increased at an accelerating rate throughout all of human history. It is reasonable to expect that given increased energy and resources going forward in time, knowledge of biotic chemistry will continue to increase exponentially. All of human history is a raindrop in the ocean of time, where significant changes to genetic diversity and geologic equilibria have occurred. Furthermore, the period of time since people have begun to tease apart the intricacies of life on a microscopic scale is only a molecule of water in that raindrop. Right now, new human technologies can develop considerably faster than new microbial functions. Humanity is on the tipping point of making prokaryotes obsolete.
With the assumption that anthropogenic energy production and knowledge of the universe will continue to increase exponentially into the next millennium, humanity is poised to make the metabolic catalysis of biogeochemical cycles by microbes unnecessary. Consciousness has allowed higher boundaries on rates of change to the environment for humans than for microbes, as evidenced by the current upsets to global carbon and nitrogen cycles. Potential for humans to alter biogeochemical processes will increase much faster than biological or geological systems will be able to adapt. The next millennium will mark the point where sufficient energy and technology will be available to humans to make prokaryotic processes antiquated and irrelevant.
Whitman WB, Coleman DC, and Wiebe WJ. 1998. Prokaryotes: The unseen majority. Proc Natl Acad Sci USA. 95(12):6578–6583. PMC33863
Rockstrom. 2009. A safe operating space for humanity. Nature. 461(24). DOI:10.1038/461472a
Falkowski PG, Fenchel T, Delong EF. 2008. The microbial engines that drive Earth’s biogeochemical cycles. Science. 320(1034). DOI:10.1126/science.1153213
Berna et al. 2012. Microstratigraphic evidence of in situ fire in the Acheulean strata of Wonderwerk Cave, Northern Cape province, South Africa. PNAS. 109(20):E1215-E1220. DOI:10.1073/pnas.1117620109
NOAA. 2018. Recent Monthly Mean CO2 at Mauna Loa.
Vitousek et al. 1997. Human Domination of Earth’s Ecosystems. Science. 277(5325): 494-499. DOI: 10.1126/science.277.5325.494
Vlachogianni and Valavanidis. 2013. Energy and Environmental Impact on the Biosphere Energy Flow, Storage and Conversion in Human Civilization. Science and Education Publishing.
Chien and Karamcheti. 2013. Moore’s Law: The First Ending and a New Beginning. Computer. 46(12):48-53. DOI: 10.1109/MC.2013.431
In 2002, values up to 500000 were discussed. In 2016, values of millions to trillions were presented. Only 20% of prokaryotes are represented by cultured species.
Thousands available just from EBI and thousands more from hundreds of other sources. The main biomes sequenced are soil, human digestive tract, marine, and freshwater, but metagenomics projects exist for almost every conceivable niche environment.
Metagenome databases: EBI, NCBI
Marker gene databases: SILVA, Greengenes
Phylogenetic for taxonomic trees. Lots of vertical transfer, single copy. Make nice trees based on sequence. Evolve over time.
Functional for characterization of biogeochemical processes. Lots of horizontal transfer because genes are desirable. Some genes are easier to transfer than others (needs more genes for pathway, makes sense to produce in different organism?). Good for looking at what is happening now, rather than change over time. Many have multiple copies that can differ in sequence.
Binning is the separation of sequences into groups that are all similar to each other on some basis. This can be done by GC content or contig length. Bins may not represent the actual organisms present in a sample because some will be thrown out due to low completeness or high contamination. Organisms that are similar to each other may be binned together.
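As a rough illustration of grouping sequences by GC content, the base R sketch below computes the GC fraction of each contig and assigns it to a bin. It is a toy under stated assumptions: the helper names `gc_content` and `bin_by_gc`, the breakpoints, and the contigs are all hypothetical, and real binning tools use additional signals rather than GC content alone.

```r
# Fraction of G and C bases in a DNA string (case-insensitive).
gc_content <- function(sequence) {
  bases <- strsplit(toupper(sequence), "")[[1]]
  sum(bases %in% c("G", "C")) / length(bases)
}

# Assign each contig to a bin by cutting GC content at fixed breakpoints.
# The breakpoints are arbitrary illustrative values.
bin_by_gc <- function(contigs, breaks = c(0, 0.4, 0.6, 1)) {
  gc <- vapply(contigs, gc_content, numeric(1))
  cut(gc, breaks = breaks, include.lowest = TRUE)
}

# Hypothetical toy contigs, far shorter than real assembled contigs.
contigs <- c(c1 = "ATATATTAAT", c2 = "GCGCGGCCGA",
             c3 = "ATGCATGCAG", c4 = "GGGCCCGCGT")
bins <- bin_by_gc(contigs)

# List the contig names that fall into each bin.
split(names(contigs), bins)
```

Here `c2` and `c4` land in the same high-GC bin even though nothing says they come from the same organism, which is the sense in which similar organisms may be binned together.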
Single-cell genome sequencing. A cell is isolated from a sample using a flow cytometer then its genome is carefully amplified by PCR and sequenced.
Third gen sequencing. Single molecule sequencing can sequence whole genomes without the need to bin or assemble genomes. Some examples are Oxford Nanopore and PacBio.
Martinez A et al. 2007. Proteorhodopsin photosystem gene expression enables photophosphorylation in a heterologous host. PNAS. 104(13):5590-95. DOI: 10.1073/pnas.0611470104
Evaluate the concept of microbial species based on environmental surveys and cultivation studies.
Explain the relationship between microdiversity, genomic diversity and metabolic potential
Comment on the forces mediating divergence and cohesion in natural microbial communities
Candy community counting
In order for a sample to be diverse, individuals must be divided into groups based on some sort of differentiation. Species definition is how these groups are chosen. Both the Simpson Index and Chao1 richness values change if the number of species changes. This means that if your species definition is more granular and has a higher taxonomic resolution, a sample will appear more diverse than the same sample analysed using a different definition of what a species is.
We did not draw any differences based on colour, deciding that different colours marked different strains within the same species. If we divided species by colour as well as brand we would have many more species in the same community. Alternatively, we could have grouped all gummy candies together and all chocolate, or all round candies together to end up with fewer total species. Certain very specific changes could have also been used to raise our species count, such as separating dark chocolate M&M’s from milk chocolate into two species.
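The effect of species definition on these two measures can be sketched in R. The formulas here are assumptions about which variants were meant: the Simpson value is computed as the reciprocal index \(1/\sum p_i^2\), and Chao1 as \(S_{obs} + F_1^2/(2F_2)\), where \(F_1\) and \(F_2\) are the numbers of singleton and doubleton species. The candy counts are made up for illustration and are not our class data.

```r
# Simpson reciprocal index: 1 / sum(p_i^2), where p_i is each species' share.
simpson_reciprocal <- function(counts) {
  p <- counts / sum(counts)
  1 / sum(p^2)
}

# Chao1 richness estimate: S_obs + F1^2 / (2 * F2), falling back to the
# bias-corrected form S_obs + F1 * (F1 - 1) / 2 when there are no doubletons.
chao1 <- function(counts) {
  counts <- counts[counts > 0]
  s_obs <- length(counts)
  f1 <- sum(counts == 1)
  f2 <- sum(counts == 2)
  if (f2 > 0) s_obs + f1^2 / (2 * f2) else s_obs + f1 * (f1 - 1) / 2
}

# Hypothetical counts for the same 24 candies under two species definitions.
by_brand <- c(mms = 12, skittles = 8, gummy_bears = 4)
by_brand_and_colour <- c(mms_red = 5, mms_blue = 4, mms_green = 3,
                         skittles_red = 4, skittles_yellow = 4,
                         gummy_red = 2, gummy_green = 1, gummy_yellow = 1)

diversity <- data.frame(
  definition = c("brand", "brand + colour"),
  simpson = c(simpson_reciprocal(by_brand), simpson_reciprocal(by_brand_and_colour)),
  chao1 = c(chao1(by_brand), chao1(by_brand_and_colour))
)
print(diversity)
```

The community is identical in both rows of the result; only the grouping changes, and the finer definition gives higher values for both measures.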
Sanger and Illumina sequencing both use PCR before sequencing, whether for raising the template concentration or cluster generation. PCR can introduce changes to the sequence when Taq DNA polymerase mismatches a base, but these usually do not make it through to the final sequence because base calling in both systems is based on an integrated value of many molecules at once. However, if an incorrect sequence was generated in an early PCR cycle, it could continue to be replicated and eventually compose a sizeable portion of the final sequence pool being observed. This could cause incorrect sequences. Another problem that PCR introduces is chimera generation. Sequences can recombine part way through replication and create new, hybrid sequences containing part of two other original sequences. This new sequence will be replicated and eventually sequenced. Once sequences are obtained and it is time to try to bin into species, PCR error will blur the edges of similar sequences, even if they should all be contained in one species. More importantly, chimeric sequences will appear as completely different organisms and drastically raise the number of species in a sample and therefore the diversity of the sample. Third generation sequencing based on single molecules, such as Oxford Nanopore, eliminates the issues introduced by PCR because it is no longer a necessary part of the sequencing process.
“Discuss the challenges involved in defining a microbial species and how HGT complicates matters, especially in the context of the evolution and phylogenetic distribution of microbial metabolic pathways. Can you comment on how HGT influences the maintenance of global biogeochemical cycles through time? Finally, do you think it is necessary to have a clear definition of a microbial species? Why or why not?”
The definition of “species” used in macroscopic biology is useless when applied to prokaryotes, as the definition relies on sexual reproduction producing viable offspring to determine whether two organisms belong to the same species. Currently, there is no clear definition of what a species is for prokaryotic organisms, due to a plethora of compounding reasons. Morphological differences present in prokaryotes are not as diverse as the global pool of microbial organisms, and the marker genes used in genetic definitions do not always retain their sequence across a whole species. On the other hand, there may not be enough differentiation in marker sequence even at the genus level to usefully separate organisms into appropriate species. Horizontal gene transfer (HGT) complicates the issue even further, as copies of DNA segments can spread across species and even genera, throwing a wrench into the workings of any species definition based on whole genome similarity. HGT frustrates the species definition aspect of microbiology, but it plays an important role in the maintenance of biogeochemical cycles. Environmental conditions may change such that a particular species will die out, but a horizontally transferred gene and its environmental function can persist in another species that is perhaps better suited to this new environment. Perhaps it is not even necessary to have a clear definition of prokaryotic species, outside of medical microbiology, because specific functions are what are important to humans. The net function of a microbial community matters more than which specific constituents make up the community. Tracking the evolution and diversity of genes over time may be more useful for environmental applications than species as it allows for a direct observation of function.
Traditionally, multicellular organisms were classified into the same species if their offspring were fertile. For microbes, who reproduce asexually by dividing one organism into two new ones, this definition completely misses the mark. Any attempt to classify prokaryotes by morphological features is also futile, as the range of diagnostic physical characteristics in unicellular organisms is just too small. Instead, modern microbiology relies on genetic similarities to classify organisms into species. Unfortunately, this genetic system is also fraught with problems. Different methods of determining genetic similarity are used by different researchers, including hybridization temperature, average nucleotide similarity across a genome, sequence similarity in a gene conserved across taxa, and sequence similarity in just one region of a gene (Kim et al., 2014). Since all of these definitions rely on some threshold value of similarity to determine whether a given sequence falls into one or another species, these arbitrarily-chosen threshold values provide another point of variation in how species are defined by different researchers. There is a fundamental tradeoff between taxonomic resolution and inclusion of organisms that would not normally be considered part of a certain species, when these thresholds are changed. Sequencing error and chimera generation can provide a source of false species to all genetic species definitions, limiting their utility as a system for determining diversity (Kunin et al., 2010).
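The dependence of a genetic species call on the chosen threshold can be illustrated with a small R sketch. Everything in it is hypothetical: the helper names `percent_identity` and `same_species`, the 20-base fragments, and the two threshold values, which are picked only to show the same pair of sequences being grouped or split.

```r
# Percent identity between two pre-aligned sequences of equal length.
percent_identity <- function(seq_a, seq_b) {
  a <- strsplit(seq_a, "")[[1]]
  b <- strsplit(seq_b, "")[[1]]
  stopifnot(length(a) == length(b))
  100 * sum(a == b) / length(a)
}

# Two sequences count as one species if their identity meets the threshold.
same_species <- function(seq_a, seq_b, threshold) {
  percent_identity(seq_a, seq_b) >= threshold
}

# Hypothetical fragments that differ at a single position out of 20.
frag_1 <- "ACGTACGTACGTACGTACGT"
frag_2 <- "ACGTACGTACGTACGTACGA"

lenient_call <- same_species(frag_1, frag_2, threshold = 94)
strict_call <- same_species(frag_1, frag_2, threshold = 97)
```

The sequences have not changed between the two calls, so whether they are one species or two is decided entirely by the threshold a researcher picks.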
A final source of trouble when determining a single, clean definition of a microbial species is HGT. When segments of DNA are exchanged between cells, relatively large chunks of sequence can be integrated into a new cell’s genome. Most researchers would say that an organism should still belong to the same species if all that has changed is the acquisition of a handful of additional genes among thousands of others. However, the functional capabilities of an organism can change dramatically, depending on which genes are incorporated (Martinez et al., 2007). Even commonly used taxonomic marker genes such as the 16S rRNA gene may be transferred horizontally (Wooley et al., 2010). One possibility of a species definition that could account for HGT would be using a metric of how many genes are shared between individuals. However, this option does not solve the issue of defining separation thresholds, but rather pushes it to the gene level instead of the organism level.
Genes can persist within an environment on much longer timescales than prokaryotic species. When environmental conditions change, selective forces acting on microbial communities will also change and some species will die off or be reduced to extremely low abundance. The process of HGT allows organisms that are suited to life under the new environmental conditions to inherit biogeochemically important genes from dying species and expand to fill the newly absented niche. Even though species have been in a constant cycle of creation through mutation and removal by selection for over 4 billion years, the same core set of redox metabolic genes has been retained for nearly the full length of this period (Falkowski et al., 2008). HGT allows genes to evolve and function as autonomous entities, rather than components of an organism whose success is constrained by the overall organism’s fitness.
Medical microbiology has a longer history of species definition than environmental microbiology, and most established species in this field are defined via assayable chemical functions or physical characteristics. Function is still the attribute that is most important and pathogenicity islands can be transferred horizontally to new species that may not be identified by tests designed for the species originally carrying the genes. Exact definitions of whether an organism falls into one species or another are not medically important, only whether or not the organism causes illness. Historically, diversity has not been an important factor in medical microbiology, although the gut microbiome now bridges the fields of environmental and medical microbiology. A rock-solid species definition is not important if diversity does not matter. Because diagnosis is often time sensitive, genetic species definitions will not be useful in medical applications until sequencing and sequence data analysis can give results in the same amount of time as test strips designed to identify a certain species. In environmental microbiology, results are not quite as time sensitive and diversity of function and how it fluctuates spatiotemporally are important factors, so some unit of unique life needs to be defined.
Perhaps a clear species definition is not necessary if genes provide more pragmatic measurements of diversity, distribution, and potential function. Rather than arbitrary sequence identity thresholds to determine binning into what counts as a different gene, the definition could be primarily focussed on enzymatic function, with sequence similarity considered using a large threshold value as only a broad preliminary filter. Portions of enzymatic pathways are often lost to single genomes due to genome streamlining (Giovannoni, 2017), but the function of the whole pathway will be retained, as genes lost to a single organism will be distributed across the community (Morris et al., 2012). Combined with the distribution of similar and identical genes through HGT, gene diversity and distribution within a community provide a much more comprehensive view of environmental function than species observations.
Prokaryotes do not fit the multicellular system for species identification and no alternative has been universally agreed upon. Morphological features are insufficiently diverse, unless only a small pool of possible organisms is considered for identification, as with medical microbiology. Genetic systems of classification require a standardized portion of the genome to be observed, a standardized method of similarity measurement, and a standardized threshold for where the division between species lies. Sequencing error and horizontal gene transfer further complicate genetic species definitions as genome sequence does not follow a single path of phylogenetic inheritance to a last common ancestor. HGT provides a process that allows environmentally important genes to persist on a much longer timescale than the survival of a single species. This transfer of important genes highlights the significance of functions of specific genes and the disconnect between function and any definition of species. Defining genes as the smallest unit of life provides increased taxonomic resolution while maintaining a direct relationship to observed biogeochemical functions.
Please see “Project_01_report.html” in my portfolio directory
Welch et al. 2002. Extensive mosaic structure revealed by the complete genome sequence of uropathogenic Escherichia coli. PNAS. 99(26). [DOI: 10.1073/pnas.252529799](https://www.ncbi.nlm.nih.gov/pubmed/12471157)
Kim et al. 2014. Towards a taxonomic coherence between average nucleotide identity and 16S rRNA gene sequence similarity for species demarcation of prokaryotes. Int J Syst Evol Microbiol. 64(Pt 2):346-51. DOI: 10.1099/ijs.0.059774-0
Falkowski PG, Fenchel T, Delong EF. 2008. The microbial engines that drive Earth’s biogeochemical cycles. Science. 320(1034). DOI:10.1126/science.1153213
Giovannoni SJ. 2017. SAR11 bacteria: The most abundant plankton in the oceans. Ann Rev Mar Sci. 9(1):231-255. DOI: 10.1146/annurev-marine-010814-015934
Martinez et al. 2007. Proteorhodopsin photosystem gene expression enables photophosphorylation in a heterologous host. Proc Natl Acad Sci USA. 104(13):5590-5595. DOI:10.1073/pnas.0611470104
Wooley, Godzik, Friedberg. 2010. A primer on metagenomics. PLoS Comput Biol. 6(2):e1000667. DOI:10.1371/journal.pcbi.1000667
Kunin et al. 2010. Wrinkles in the rare biosphere: pyrosequencing errors can lead to artificial inflation of diversity estimates. Environ Microbiol. 12(1):118-23. DOI: 10.1111/j.1462-2920.2009.02051
Morris, Lenski, Zinser. 2012. The black queen hypothesis: evolution of dependencies through adaptive gene loss. mBio. 3(2):e00036-12. DOI:10.1128/mBio.00036-12
Please see “Project_02.html” in portfolio directory